Abstract. Information-theoretic measures such as relative entropy and
correlation are extremely useful when modeling or analyzing the interaction
of probabilistic systems. We survey the quantum generalization of 5 such
measures and point out some of their commonalities and interpretations. In
particular we find the application of information theory to distributional
semantics useful. By modeling the distributional meaning of words as density
operators rather than vectors, more of their semantic structure may be
exploited. Furthermore, properties of and interactions between words such as
ambiguity, similarity and entailment can be simulated more richly and
intuitively when using methods from quantum information theory.


1 Introduction 

The notion of representing quantum states as density operators, rather than as 
kets, was introduced by von Neumann [1]. This transition brought with it an
increased expressivity and complexity. While a ket can be understood to be a 
(joint) probability distribution, a density operator is a probability distribution 
over eigen kets. This additional dimension of structure ought to be exploited 
by various fields which involve uncertainty and interaction between multiple 
systems. Many of the measures developed in probability and information are 
applicable to density operators. Since a density operator is a distribution over 
kets rather than atomic symbols, these measures have been extended to take 
advantage of the added structure contained in the kets. 

One field in particular that promises to profit from quantum information
theory is distributional semantics. This approach to meaning in human language
makes use of Firth’s hypothesis that “a word is characterized by the company
it keeps” [2]. Much work in computational linguistics has evolved from this
hypothesis, in which the meanings of words are typically encoded in vectors
[3,4]. However, density operators have been used to model linguistic meaning
in several existing works. Sordoni and Nie [5] perform information retrieval
by encoding two types of a word’s context in a density operator. They leverage
the strengths of language models and vector space models in a combined data
structure. Balkir [6] generalizes the use of word vectors to density operators
to take advantage of their probabilistic interpretation. Piedeleu et al. [7]
extend the tensor framework of [8] to apply to density operators. The result
is a powerful theoretical model that compositionally derives the
distributional meaning of a sentence, given its syntax tree and the lexical
density operators. Blacoe et al. [9] take another approach to representing the
meaning of words and sentences. In their framework the density matrices for
all words, phrases and sentences live in the same semantic space. This makes
the derivation of rich sentence representations and their
information-theoretical analysis tractable.

In Section 2 we survey several information-theoretic measures and
mathematically examine the differences between them and their quantum
analogues. In Section 3 we then observe how density matrices used in the field
of distributional semantics have made use of quantum information theory. We
also provide intuitions about how the quantum generalizations of
information-theoretic measures are advantageous for modeling human language
processes. Section 4 concludes the paper.

2 Quantum Generalizations 

In this section we compare aspects of classical probability theory with their 
quantum analogue. Generalising probability distributions to density operators 
has many advantages. While probability distributions are generally over atomic 
symbols, a density operator can be considered a probability distribution over 
eigen kets. Depending on the kets’ internal structure, they can encode a lot 
of information about some system. If the system is a complex of more than 
one subsystem, the kets have order > 1 and may encode correlation among the 
subsystems. The dimensions of the Hilbert spaces for each subsystem also allow 
for holding much information. Each eigen ket can therefore represent a joint 
probability distribution. If its values are complex values, of course we have an 
additional degree of freedom to represent information. 

Given the complexity of density operators, it is difficult to grasp the
interactions among all of the information they contain. This section will
illuminate aspects of probability distributions, classical and quantum, by
mathematically analyzing measures from probability theory and information
theory. The quantum generalization of these measures has had much impact in
the field of quantum information theory [10]. We compare several such measures
and find a pattern among their quantum generalizations. This will help us
build intuitions about the gain that can come from rising to the quantum
level.

Throughout this section we will be using random variables $X$ and $Y$. More
formally, let $X$ be a random variable with possible outcomes
$x_1, \dots, x_n$ and corresponding probabilities $p(x_1), \dots, p(x_n)$. $Y$
has possible outcomes $y_1, \dots, y_m$ with corresponding probabilities
$p(y_1), \dots, p(y_m)$. Analogously to them we will use density matrices $A$
and $B$. Let their respective spectral decompositions be
$A = \sum_{i=1}^{n} \alpha_i |a_i\rangle\langle a_i|$ and
$B = \sum_{j=1}^{m} \beta_j |b_j\rangle\langle b_j|$. This means that the
$\alpha_i$ are real, positive values that sum to 1, and the same goes for the
$\beta_j$. Hence, they are the probabilities over their corresponding kets of
unit norm $|a_i\rangle$ and $|b_j\rangle$.

Two functions of relevance here are $\sqrt{\cdot}$ and $\log(\cdot)$. When
applied to a density operator $A$ they “transfer” to the operator’s eigen
values. The unique positive square root of $A$ is:

$$\sqrt{A} = \sum_i \sqrt{\alpha_i}\, |a_i\rangle\langle a_i| \tag{1}$$

and the logarithm of $A$ is:

$$\log A = \sum_i \log(\alpha_i)\, |a_i\rangle\langle a_i| \tag{2}$$
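The way a scalar function “transfers” to the eigen values can be sketched numerically. The following is a minimal illustration, not part of the surveyed works: the helper name `apply_to_spectrum` and the example matrix are assumptions chosen for this sketch, and the matrix is taken to have full rank so that the logarithm is finite.

```python
import numpy as np

def apply_to_spectrum(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigen values.

    Hypothetical helper: it diagonalizes A, applies f to each eigen value and
    reassembles sum_i f(alpha_i) |a_i><a_i|.
    """
    eigvals, eigvecs = np.linalg.eigh(A)       # columns of eigvecs are the eigen kets
    return (eigvecs * f(eigvals)) @ eigvecs.conj().T

# Illustrative full-rank density matrix (trace 1, positive eigen values).
A = np.array([[0.7, 0.2],
              [0.2, 0.3]])

sqrt_A = apply_to_spectrum(A, np.sqrt)   # square root of A, as in Equation 1
log_A = apply_to_spectrum(A, np.log2)    # base-2 logarithm of A, as in Equation 2
```

Squaring `sqrt_A` recovers `A`, which is a quick sanity check that the function was applied to the spectrum rather than entry by entry.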


2.1 Von Neumann Entropy 

One fundamental quantum generalization is that of entropy. While the Shannon
entropy for a random variable $X$ is $H(X) = -\sum_i p(x_i) \log_2 p(x_i)$, its
quantum analogue is von Neumann entropy [11]:

$$S(A) = -\mathrm{Tr}(A \log_2 A) = -\sum_i \alpha_i \log_2 \alpha_i \tag{3}$$

Quantum entropy underlies several of the information-theoretic measures that 
this section addresses. It is a measure of the information content of a quantum 
state. Whereas the entropy function is defined for only one probability distribu¬ 
tion, the following measures will be over pairs of probability distributions. 
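As a small numerical sketch of the entropy of a quantum state, the function below computes it from the eigen values of a density matrix, using the usual sign convention in which entropy is non-negative. The function name and the cut-off `1e-12` for treating eigen values as zero are assumptions of this illustration.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy in bits of a density matrix rho, computed from its eigen values."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]          # convention: 0 * log 0 = 0
    return float(-np.sum(eigvals * np.log2(eigvals)))

pure_state = np.array([[0.5, 0.5],
                       [0.5, 0.5]])             # a single ket: no uncertainty
mixed_state = np.eye(2) / 2                     # uniform mixture of two kets

print(von_neumann_entropy(pure_state))
print(von_neumann_entropy(mixed_state))
```

The pure state has entropy close to 0 even though its matrix is not diagonal, while the maximally mixed state on two dimensions has entropy 1 bit.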

2.2 Incompatibility 

At the heart of the difference between classical and quantum probability lies 
the matter of incompatibility. Two systems are incompatible whenever they are
correlated and can therefore not be treated independently. Mathematically, this
is the case when the systems’ operators $A$, $B$ do not commute, that is
$AB \neq BA$. Another way of making this obvious is to compare $A$’s and $B$’s
eigen bases. If there is some one-to-one alignment between $A$’s and $B$’s
eigen kets such that $|a_k\rangle = |b_k\rangle$ for
$k = 1, \dots, \min(n, m)$ then we do not have a quantum effect. In this case
the interaction of $A$ and $B$ reduces to the classical case where
$\{\alpha_i\}_i$ and $\{\beta_j\}_j$ are regular probability distributions.
However, if $A$’s and $B$’s eigen bases do not match up so neatly, then
non-classical effects take place. In the following we will analyze how
well-known measures for probability distributions are affected by the presence
of this incompatibility.
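The commutation criterion can be checked directly. The sketch below is only an illustration under assumed toy values: the helper names `commute` and `density_from_spectrum`, the eigen values and the 45-degree rotation of the eigen basis are all chosen for this example.

```python
import numpy as np

def commute(A, B, tol=1e-12):
    """Return True if the operators A and B commute, i.e. AB = BA up to tol."""
    return bool(np.allclose(A @ B, B @ A, atol=tol))

def density_from_spectrum(probs, kets):
    """Build sum_i probs[i] |ket_i><ket_i| from eigen values and orthonormal kets."""
    return sum(p * np.outer(k, np.conj(k)) for p, k in zip(probs, kets))

e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (e0 + e1) / np.sqrt(2), (e0 - e1) / np.sqrt(2)

A = density_from_spectrum([0.8, 0.2], [e0, e1])
B_aligned = density_from_spectrum([0.3, 0.7], [e0, e1])       # same eigen basis as A
B_rotated = density_from_spectrum([0.3, 0.7], [plus, minus])  # eigen basis rotated by 45 degrees

print(commute(A, B_aligned))   # compatible: reduces to the classical case
print(commute(A, B_rotated))   # incompatible: eigen bases do not line up
```

Both versions of B have the same eigen values; only the orientation of the eigen kets relative to those of A decides whether the pair is compatible.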

2.3 Measurement 

In the classical paradigm, a system’s properties are independent of the way in 
which they are observed. So if there is uncertainty about the state of a system, 
that uncertainty is not due to any aspect of the system. It is merely epistemic, 
that is, in the observer’s mind. We represent this uncertainty with the random
variable $X$. Let its possible states $x_1, \dots, x_n$ be real values. Then
the overall value of $X$, given the observer’s uncertainty, is its expected
value:

$$E(X) = \sum_i x_i \times p(x_i) \tag{4}$$

Quantum mechanics questions the realist perspective mentioned above and
posits that uncertainty can also be ontological. That is to say, there are
propositions which have an uncertain value, independent of our (lack of)
knowledge thereof. Let us represent our knowledge of the observed system by
$A$ and the mixed state of the system as $B$. We cast the situation of
observing the system, given our limited knowledge, as a measurement scenario.
Thus, $A$ is regarded as the observable for measuring $B$, and the projectors
$A_i$ come from $A$’s eigen kets: $A_i = |a_i\rangle\langle a_i|$. The
measurement’s outcome value will be one of $A$’s eigen values.

However, we will not allow the measurement process to collapse the state. Von
Neumann [1] splits the process into two steps: (1) measurement, which assigns
each possible outcome state $A_i$ a probability $p_i$, and (2) observation,
which collapses the system’s outcome state to $A_i$ with probability $p_i$. If
the second step never occurs, then the system remains in a mixed state, also
called statistical state, which is the sum of all $A_i$ weighted by the
corresponding probabilities $p_i$. In a projective measurement of system $B$
the probability $p_i$ for measuring $A$’s eigen value $\alpha_i$ is:

$$p_i = \langle a_i|B|a_i\rangle = \langle a_i|\Big(\sum_j \beta_j |b_j\rangle\langle b_j|\Big)|a_i\rangle = \sum_j \beta_j \langle a_i|b_j\rangle\langle b_j|a_i\rangle \tag{5}$$

Therefore, if we forego the collapse, the statistical outcome value is

$$\sum_i \alpha_i \times p_i = \sum_i \alpha_i \sum_j \beta_j \langle a_i|b_j\rangle\langle b_j|a_i\rangle = \sum_{i,j} \alpha_i \beta_j \langle a_i|b_j\rangle\langle b_j|a_i\rangle \tag{6}$$
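The non-collapsing measurement described above can be sketched as follows. This is an illustrative computation under assumed toy matrices; the function names are hypothetical, and the observable A is taken to be a density matrix as in the text.

```python
import numpy as np

def measurement_probabilities(A, B):
    """Return A's eigen values and the probability of each outcome when measuring B.

    For every eigen ket |a_i> of the observable A the probability is <a_i|B|a_i>.
    """
    alphas, kets = np.linalg.eigh(A)                       # columns are the |a_i>
    probs = np.real(np.einsum("ji,jk,ki->i", kets.conj(), B, kets))
    return alphas, probs

def statistical_outcome(A, B):
    """Outcome value without collapse: eigen values of A weighted by their probabilities."""
    alphas, probs = measurement_probabilities(A, B)
    return float(np.dot(alphas, probs))

A = np.diag([0.8, 0.2])                  # observable (our knowledge)
B = np.array([[0.5, -0.2],
              [-0.2, 0.5]])              # mixed state with a rotated eigen basis

alphas, probs = measurement_probabilities(A, B)
print(probs, statistical_outcome(A, B))
```

The statistical outcome value coincides with Tr(AB), which gives a basis-independent way to cross-check the double sum over eigen kets.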


2.4 Fidelity 

The Bhattacharyya coefficient is a useful means to compute the similarity of two 
probability distributions. It is defined as: 

$$BC(X, Y) = \sum_i \sqrt{p(x_i)\, p(y_i)} \tag{7}$$

Its minimum value is 0, and its maximum value 1 is reached if and only if
$X = Y$. For two probability distributions to be equal they must have the same
number of probabilities and the same values in the same position (marked by
the index $i$). Starting from two equal distributions, any permutation that
changes which values share an index will cause the Bhattacharyya coefficient
to be less than 1.

The quantum analogue is the fidelity of two quantum states. This is a common
measure of the similarity between two mixed states. It is defined as:

$$F(A, B) = \mathrm{Tr}(\sqrt{A}\sqrt{B}) = \sum_{i,j} \sqrt{\alpha_i \beta_j}\, \langle a_i|b_j\rangle\langle b_j|a_i\rangle = \sum_i \sqrt{\alpha_i} \Big(\sum_j \sqrt{\beta_j}\, \langle a_i|b_j\rangle\langle b_j|a_i\rangle\Big) \tag{8}$$

Again its value ranges from 0 to 1, with $F(A, B) = 1$ if and only if $A = B$.
So, for this to happen, not only the eigen values need to align, but also
$A$’s and $B$’s eigen kets. This is obviously a classical case, as $A$ and $B$
must be diagonal in the same basis in order to be equal.

An important difference between BC and F is how the probabilities align.
In the classical version, the closer the probabilities with the same index are
to each other, the higher the Bhattacharyya coefficient. Fidelity makes no
requirements about indices being equal. As long as there is some one-to-one
connection between A’s and B’s eigen kets via their inner product, the
corresponding eigen values’ square roots will be multiplied and then summed
over.
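A short numerical comparison of the two similarity measures may help here. The sketch uses the trace form of fidelity given in Equation 8; the helper names and the example matrices are assumptions made for illustration only.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two distributions aligned by index."""
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

def sqrt_psd(M):
    """Positive square root of a positive semi-definite matrix via its spectrum."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)              # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(A, B):
    """F(A, B) = Tr(sqrt(A) sqrt(B)), the form used in Equation 8."""
    return float(np.real(np.trace(sqrt_psd(A) @ sqrt_psd(B))))

p, q = [0.8, 0.2], [0.2, 0.8]
A, B_same_basis = np.diag(p), np.diag(q)
B_rotated = np.array([[0.5, -0.3],
                      [-0.3, 0.5]])              # eigen values 0.2 and 0.8, rotated eigen kets

print(bhattacharyya(p, q), fidelity(A, B_same_basis), fidelity(A, B_rotated))
```

When the two operators share an eigen basis, the fidelity equals the Bhattacharyya coefficient of their spectra; rotating the eigen kets of B changes the value although its eigen values stay the same.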

2.5 Relative Entropy 

Another function over probability distributions is the Kullback-Leibler (KL)
divergence. It measures how far one probability distribution diverges from
another. Let a given system’s true probabilistic state be $X$, and let it be
approximated by a distribution $Y$. The divergence of $Y$ from $X$ can be
thought of as the amount of additional information required to correct the
errors made by this approximation. The KL divergence of $Y$ from $X$ is
defined as:

$$D_{KL}(X\|Y) = \sum_i p(x_i) \log_2 p(x_i) - \sum_i p(x_i) \log_2 p(y_i) = \sum_i p(x_i) \log_2 \frac{p(x_i)}{p(y_i)} \tag{9}$$

It is clear that this measure is not symmetric. Furthermore, it is closely
tied to the entropy measure. As is the case with entropy, KL divergence lends
itself well to quantum generalization [12]. How much information is lost when
describing a quantum state $A$ using an approximated state representation $B$?
Again, this measure is related to entropy, and uses the same letter $S$:

$$\begin{aligned}
S(A\|B) &= \mathrm{Tr}(A \log_2 A) - \mathrm{Tr}(A \log_2 B) \\
&= \sum_i \alpha_i \log_2 \alpha_i - \sum_{i,j} \alpha_i \log_2(\beta_j)\, \langle a_i|b_j\rangle\langle b_j|a_i\rangle \\
&= \sum_i \alpha_i \log_2 \alpha_i - \sum_i \alpha_i \Big(\sum_j \log_2(\beta_j)\, \langle a_i|b_j\rangle\langle b_j|a_i\rangle\Big)
\end{aligned} \tag{10}$$

Relative entropy is often used intuitively as a measure of distance between
the states $A$ and $B$. We are guaranteed by Jensen’s inequality [13] that it
is always non-negative. It is, however, not a metric since it is not even
symmetric in its arguments. We see that it extends the KL divergence by
bringing together all pairs of $A$’s and $B$’s eigen kets in an inner product.
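The relation between KL divergence and quantum relative entropy can be illustrated numerically. The sketch below assumes full-rank density matrices so that all logarithms are finite; the function names and example values are hypothetical.

```python
import math
import numpy as np

def kl_divergence(p, q):
    """KL divergence in bits of q from p, for distributions aligned by index."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def log2_psd(M):
    """Base-2 logarithm of a full-rank density matrix via its spectrum."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.log2(vals)) @ vecs.conj().T

def relative_entropy(A, B):
    """S(A||B) = Tr(A log2 A) - Tr(A log2 B); assumes A and B have full rank."""
    return float(np.real(np.trace(A @ log2_psd(A)) - np.trace(A @ log2_psd(B))))

p, q = [0.8, 0.2], [0.3, 0.7]
A = np.diag(p)
B_same_basis = np.diag(q)
B_rotated = np.array([[0.5, -0.2],
                      [-0.2, 0.5]])        # eigen values 0.3 and 0.7, rotated eigen kets

print(kl_divergence(p, q), relative_entropy(A, B_same_basis))
print(relative_entropy(A, B_rotated), relative_entropy(B_rotated, A))
```

In a shared eigen basis the quantum measure reduces to the KL divergence of the spectra, and swapping the arguments generally yields a different value, which reflects the asymmetry noted above.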

2.6 Common Theme 

Measurement, fidelity and relative entropy are information-theoretic measures 
involving (at least) two density operators. We observe that the way each of
them extends its classical analogue for probability distributions follows a
pattern. It involves the term
$\langle a_i|b_j\rangle\langle b_j|a_i\rangle$, a contribution made possible
by using density operators as representations. But what does this “quantum
term” contribute? What use is this extension to us? We will identify
theoretical advantages here, and in the following section apply them to the
field of distributional semantics.

As mentioned earlier, the quantum term has something to do with the alignment
and interaction of probabilities. In the classical paradigm, the alignment
is straightforward. That is, the indices $i$ and $j$ collapse to one index,
and the probabilities’ alignment is explicitly defined. We also mentioned that
this one-to-one alignment is simulated by density operators whenever they
commute, because then for each pair of eigen kets $|a_i\rangle$, $|b_j\rangle$
the inner product is either 0 or 1. Whenever $\langle a_i|b_j\rangle = 1$ the
corresponding probabilities $\alpha_i$ and $\beta_j$ interact.

The interaction of the density operators’ eigen values uses varying functions
for the different measures: In Section 2.3 they get multiplied, in Section 2.4
their square roots get multiplied, and in Section 2.5 $\alpha_i$ gets
multiplied with $\log_2 \beta_j$. This is true for the classical case as well
as for the quantum case. However, when $A$ and $B$ do not commute, the value
$\beta_i$ is amended by the quantum term in a systematic way:

In Section 2.3, if $A$ and $B$ are incompatible the product
$\alpha_i \times \beta_i$ generalizes as follows:

$$\beta_i \;\rightarrow\; \sum_j \beta_j\, \langle a_i|b_j\rangle\langle b_j|a_i\rangle \tag{11}$$

In Section 2.4 the product of square roots
$\sqrt{\alpha_i} \times \sqrt{\beta_i}$ is affected thus:

$$\sqrt{\beta_i} \;\rightarrow\; \sum_j \sqrt{\beta_j}\, \langle a_i|b_j\rangle\langle b_j|a_i\rangle \tag{12}$$

And in Section 2.5 the term $\alpha_i \times \log_2(\beta_i)$ undergoes this
change:

$$\log_2(\beta_i) \;\rightarrow\; \sum_j \log_2(\beta_j)\, \langle a_i|b_j\rangle\langle b_j|a_i\rangle \tag{13}$$

Each generalization has in common that the function applied to $\beta_i$ is
now applied to all $\beta_j$, each of which is then weighted by the quantum
term. What is gained by this added term? How does this change the interaction
between $A$’s and $B$’s eigen values? We might say that the alignment of
$\{\alpha_i\}_i$ and $\{\beta_j\}_j$ has been very much relaxed. Due to the
quantum term, all eigen kets interact, making it possible for all eigen values
to interact. There is no longer a strict one-to-one correspondence of
probabilities. One great advantage is that when designing density operators
for states one does not need to manually pair up the states’ possible
outcomes, or whatever it is the eigen kets encode. Cross-overs or “soft
alignments” are now possible, making the interaction of the two distributions
richer and more versatile.
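The role of the quantum term as a “soft alignment” can be made concrete by collecting it in a matrix of squared overlaps between the two eigen bases. The following sketch is an illustration under assumed example matrices; the name `overlap_matrix` and the variable names are hypothetical, and both matrices are taken to have full rank.

```python
import numpy as np

def overlap_matrix(A, B):
    """Return the spectra of A and B and W[i, j] = <a_i|b_j><b_j|a_i>."""
    alphas, a_kets = np.linalg.eigh(A)
    betas, b_kets = np.linalg.eigh(B)
    W = np.abs(a_kets.conj().T @ b_kets) ** 2      # squared overlaps of eigen kets
    return alphas, betas, W

A = np.array([[0.7, 0.2],
              [0.2, 0.3]])
B = np.diag([0.6, 0.4])

alphas, betas, W = overlap_matrix(A, B)

# The three measures differ only in the function applied to the eigen values.
expectation = alphas @ W @ betas                                        # Section 2.3
fidelity = np.sqrt(alphas) @ W @ np.sqrt(betas)                         # Section 2.4
rel_entropy = alphas @ np.log2(alphas) - alphas @ W @ np.log2(betas)    # Section 2.5

print(W)
print(expectation, fidelity, rel_entropy)
```

Every row and column of `W` sums to 1, so each eigen value of one operator is spread over all eigen values of the other. When the operators commute and have distinct eigen values, `W` becomes a permutation matrix and the strict one-to-one alignment of the classical case is recovered.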

2.7 Entanglement 

We mention one further quantum generalization here: The classical correlation 
among (random) variables has an analogue called quantum correlation or entan¬ 
glement. 

Let $p(x_i, y_j)$ be the joint distribution that jointly represents the
distributions from variables $X$ and $Y$. If the two variables are correlated
then this is measurable by their mutual information, which is defined as:

$$I(X; Y) = \sum_{i,j} p(x_i, y_j) \log_2 \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)} \tag{14}$$

If $p(x_i, y_j) = p(x_i) \times p(y_j)$ for all $i, j$ then $p(x_i, y_j)$ is
merely a “product distribution”. In this case there is no correlation among
$X$ and $Y$ and the mutual information is 0 (the logarithm’s value in
Equation 14 vanishes). However, if $X$ and $Y$ are in the least correlated,
then $I(X; Y)$ is positive.
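The classical case can be checked with a few lines of code. The sketch below uses the standard definition of mutual information in bits; the function name and the two toy joint distributions are assumptions of this illustration.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]                 # marginal of X
    py = [sum(col) for col in zip(*joint)]           # marginal of Y
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0                                   # convention: 0 * log 0 = 0
    )

product_distribution = [[0.25, 0.25],
                        [0.25, 0.25]]                # X and Y independent
perfectly_correlated = [[0.5, 0.0],
                        [0.0, 0.5]]                  # Y determined by X

print(mutual_information(product_distribution))
print(mutual_information(perfectly_correlated))
```

The product distribution yields zero mutual information, while the perfectly correlated pair of binary variables shares one full bit.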

The matter of correlation in quantum information theory has been the topic
of much research [14]. It turns out that quantum correlation does not subsume
classical correlation. Rather, in different quantum states they may or may not
coexist. In what follows we adopt Werner’s [15] distinction between the two
species of correlation.

As with entropy, formulating a quantum analogue of mutual information is
straightforward. If $C$ is the density operator that describes a bipartite
system in $\mathcal{H}_1 \otimes \mathcal{H}_2$, then the correlation among
its two subsystems can be measured by $C$’s quantum mutual information. Let
$A$ and $B$ be reductions of $C$ via the partial trace:
$A = \mathrm{Tr}_{\mathcal{H}_2}(C)$ and $B = \mathrm{Tr}_{\mathcal{H}_1}(C)$.
If $C = A \otimes B$ is the product state of $A$ and $B$ then the subsystems
are in no way correlated, that is they are independent.

If $C$ is not a product state, it may nevertheless be separable. If this is
the case then there are probabilities $\{\gamma_i\}_i$ and matrices
$\{A_i\}_i$ and $\{B_i\}_i$ such that

$$C = \sum_i \gamma_i\, A_i \otimes B_i \tag{15}$$

In such a situation it is possible for $C$ to contain classical correlation,
but no quantum correlation.

Beyond independence and separability, another possibility is entanglement,
which $C$ contains if it is outside of the set of all separable states.
Entangled states may still contain classical correlation. The authors of [16]
use relative entropy to make clear the difference between different kinds of
(possibly simultaneous) correlation: They call total correlation the sum of
classical correlation and quantum correlation. This is defined analogously to
classical mutual information using the function for von Neumann entropy:

$$\mathrm{Corr}_{total} = S(A) + S(B) - S(C) = S(C \| A \otimes B) \tag{16}$$

This indicates that the total correlation between subsystems $A$ and $B$ can
be expressed in terms of relative entropy. The information that the two
systems share in their combined state $C$ is the amount of information lost
when approximating them as the product of their reduced states. In other
words, their correlation is the “distance” between their current state and
their totally uncorrelated state.

The authors of [16] divide the total correlation into two parts using the set
$D = \{D_m\}_m$ of all separable states. Entanglement is the additional amount
of correlation found in states for being outside of $D$. Therefore $C$’s
entanglement can be quantified as its distance to the set $D$, which is the
distance of $C$ to the “nearest” separable state using relative entropy as
distance measure:

$$\mathrm{Corr}_{quantum} = \min_m S(C \| D_m) \tag{17}$$

The classical correlation in $C$ is whatever is left over from the total
correlation:

$$\mathrm{Corr}_{classical} = \mathrm{Corr}_{total} - \mathrm{Corr}_{quantum} \tag{18}$$

As before, the additional correlation among quantum subsystems is made 
possible through representing them as mixed states. Due to the interaction 
among eigen kets, the density operators’ eigen values can interact in a way that 
goes beyond the correlation among two regular probability distributions. 
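The total correlation of Equation 16 can be computed for small bipartite states with a partial trace. The sketch below is a toy illustration: the function names, the two-qubit examples and the cut-off `1e-12` are assumptions, and it computes only the total correlation, not its split into classical and quantum parts (which would require a minimization over separable states).

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits, computed from the eigen values of rho."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log2(vals)))

def reduced_states(C, d1, d2):
    """Partial traces of a density matrix C on H1 (x) H2 with dimensions d1 and d2."""
    T = C.reshape(d1, d2, d1, d2)
    A = np.einsum("ijkj->ik", T)      # trace out the second subsystem
    B = np.einsum("ijik->jk", T)      # trace out the first subsystem
    return A, B

def total_correlation(C, d1, d2):
    """S(A) + S(B) - S(C) for the reductions A and B of C."""
    A, B = reduced_states(C, d1, d2)
    return entropy(A) + entropy(B) - entropy(C)

product = np.kron(np.diag([0.8, 0.2]), np.diag([0.5, 0.5]))   # C = A (x) B
separable = np.diag([0.5, 0.0, 0.0, 0.5])                     # mixture of |00> and |11>
bell_ket = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
entangled = np.outer(bell_ket, bell_ket)                      # a single entangled ket

for C in (product, separable, entangled):
    print(total_correlation(C, 2, 2))
```

The product state has no correlation, the separable mixture carries one bit, and the entangled pure state carries two bits, more than any pair of classical binary variables could share.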

3 Applicability in Linguistics 

This section goes through the functions mentioned in Section 2 and applies
them to distributional semantics. The works mentioned in Section 1 which use
density operators for linguistic tasks use these functions to compute values
relevant to natural language technology, such as the degree of similarity and
entailment. First we review an example of how a density operator representing
a word’s meaning is derived from a corpus of text. Then we use the ambiguous
example words book and schedule to illustrate their interpretation and
amenability to quantum information theory.

3.1 An Example 

The following is an example of generating lexical density matrices in the
style of [9]. They scour a large corpus of lemmatized and dependency-parsed
text documents, counting for each target word how often it occurs with certain
contexts. In distributional semantics it is common to encode the (relative)
frequency of context words near the target word in a vector. However, [9]
define contexts to be syntactical neighborhoods, that is the set of all words
in the sentence connected directly to the target word via some dependency
relation. For each type of relation $R_i$ there is a vector space
$V_{R_i} = \mathbb{C}^d$ whose dimensions encode the $d$ most common words
under that relation. The overall Hilbert space is then the tensor product
$\mathcal{H} = V_{R_1} \otimes \dots \otimes V_{R_n}$. A ket
$|w_i, doc_j\rangle \in \mathcal{H}$ represents the (relative) frequencies of
the word $w_i$ occurring with common dependency neighborhoods in document
$doc_j$ of the corpus.

Density matrices $\hat{w}_i$ are created from these “document kets” as
follows:

$$\hat{w}_i = \frac{\sum_j |w_i, doc_j\rangle\langle w_i, doc_j|}{\mathrm{Tr}\big(\sum_j |w_i, doc_j\rangle\langle w_i, doc_j|\big)} \tag{19}$$

As the document kets are superpositions of contexts, summing over their outer
products in this way causes $\hat{w}_i$ to have its own eigen basis. If two
documents use $w_i$ in the same sense, then their information will cluster in
the same eigen ket of $\hat{w}_i$. Otherwise their distributional information
will likely end up in different eigen kets.

The division by the trace in Equation 19 causes $\hat{w}_i$ to be normalized.
Hence its eigen values sum to 1 and can be interpreted as probabilities. The
spectrum of $\hat{w}_i$ gives us a probability distribution over kets, each of
which encodes a sense of the word $w_i$. This is a powerful data structure, as
each sense of $w_i$ is not represented by a symbol, but rather by a
multipartite ket which can be analyzed even further.
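The construction of a lexical density matrix from document kets can be sketched on toy data. Everything in the example is hypothetical: the function name, the four context dimensions and the counts are invented for illustration, and a single flat context space is used instead of the tensor product of relation-specific spaces.

```python
import numpy as np

def lexical_density_matrix(doc_kets):
    """Sum the outer products of real-valued document kets and divide by the trace."""
    kets = np.asarray(doc_kets, dtype=float)      # one document ket per row
    unnormalized = kets.T @ kets                  # sum_j |doc_j><doc_j| for real kets
    return unnormalized / np.trace(unnormalized)

# Hypothetical context counts for one target word in three documents.
# Documents 1 and 2 use the word with the same kind of contexts (one sense),
# document 3 uses it with entirely different contexts (another sense).
doc_kets = [
    [3.0, 1.0, 0.0, 0.0],
    [6.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
]

rho = lexical_density_matrix(doc_kets)
sense_probs, sense_kets = np.linalg.eigh(rho)     # spectrum: probabilities over sense kets

print(np.round(sense_probs, 4))
```

In this toy case only two eigen values are non-zero: the two documents that share a sense collapse into one eigen ket with most of the probability mass, and the third document forms a second, less probable sense.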

Even though the other models mentioned above differ from that of [9] in 
several ways, they share a common idea: A word’s meaning is represented by 
a density operator, the spectral decomposition of which reveals its senses and 
their relative frequency. In the following subsections we will put this idea into 
the context of quantum information theory and explain the application of the
measures covered in Section 2 to the semantics of words.

In the following subsections we consider the example words book and schedule
as an illustration of our information-theoretical analysis. Both words can be
used as nouns or as verbs. As a noun, book has at least two senses: (1) a
written work, as in the book of Isaiah, and (2) a collection of playing cards
that counts as a book according to a game’s rules. When used as a verb, book
can mean (1) to book a venue or a ticket, or (2) to book someone for a crime
they committed. The noun schedule is a calendar or planner. The verb schedule
is closely related, as it is the action performed by recording or planning a
time and a place for some activity.

3.2 Ambiguity 

Natural language is riddled with ambiguity. Words can have many kinds of
ambiguity, including polysemy and homonymy. The former is the case when a
word’s senses are related. The latter results from two unrelated words ending
up with the same written and/or spoken form over time. In many dictionaries a
word’s various senses have separate entries specifying their part of speech
and meaning in context. One measure of ambiguity is, therefore, the number of
dictionary entries a word has. Dictionaries often only list the most prevalent
senses, though. Furthermore, if one sense is less common than another it
should weigh less into the word’s degree of ambiguity. This concern is
naturally addressed by the entropy measure. With the entropy function, each
sense’s information content is weighted by its probability and summed over. In
our example we did not assign probabilities to the senses of book and
schedule, but intuitively we can assume that the former has a higher entropy
because it has more senses than the latter, each sense having a considerable
probability mass.

Piedeleu et al. [7] use entropy to compute the ambiguity of words and phrases.
They show through examples of adjective-noun modification and through nouns
modified by relative clauses that this compositional process reduces the
noun’s entropy. In the same vein, [17] demonstrated that the entropy of a
sentence is generally lower than that of its words.


3.3 Language Understanding 

When we hear a word in isolation its intended meaning could be any of its senses. 
However, this rarely happens in human communication. Other words serve to 
disambiguate it, especially those heard before the target word. The more
recently another word was heard, the stronger is its potential effect on coming 
words. This is a type of priming. That is, recent words build an expectation 
for following words. They build the context in which the next words will be 
evaluated. If, for example, schedule was heard shortly before the utterance of 
book , then it (and potentially other intervening words) would have an effect on 
how the hearer decides which sense of book was intended by the speaker. 

This incremental composition can be cast in terms of measurement as in
Section 2.3. The previous words including schedule constitute the state of the
current system, that is our knowledge of the speaker’s intended meaning. It is
a mixed state, given the different senses of schedule. When the hearer
encounters the word book, she compares all of its senses to the existing state
of knowledge. Each sense ket of book is assigned a probability through this
comparison. We leave as an open question whether measuring the book observable
collapses the system’s state to one of book’s eigen kets or not during a
cognitive process such as this (see Section 2.3).


3.4 Similarity 

What makes two words synonymous? Most words have multiple senses, even 
if they only differ through some semantic or syntactic feature. In order for
two words to be synonyms, they would have to be interchangeable in every
possible context. That is, they would have to have the same inventory of
senses. When considering the disambiguation and composition of word meaning as
above, an additional criterion is that the probabilities for each sense would
have to be exactly the same in both words.

It is virtually impossible to find two words that have all senses in common 
and with equal probability. We therefore need a similarity measure that takes 
differences in both of these two dimensions into account. Quantum fidelity is 
sensitive to both and renders a useful similarity value between 0 and 1 [6]. The
fidelity function can figure out which senses of one word align to which degree 
with which senses of another word. A sense of one word may overlap somewhat 
with several senses of another word. In the case of our example words, the verbal 
sense of schedule overlaps highly with the first verbal sense of book. All other 
pairs of sense kets will have very little overlap at most, thus preventing the 
corresponding probabilities from contributing to the overall fidelity. 


3.5 Entailment 

A word $w_1$ entails another word $w_2$ if $w_1$ is a special case of $w_2$.
This relation becomes apparent in distributional semantics when $w_1$’s
typical contexts are a subset of $w_2$’s typical contexts. In other words,
$w_1$ is always replaceable by $w_2$, but not necessarily vice versa. This
intuition has been captured by varying measures in the semantics literature
[18]. Additionally, relative entropy among quantum states has been suggested
as an asymmetrical similarity measure among words [5]. The relative entropy
$S(\hat{w}_2 \| \hat{w}_1)$ between the two words’ density operators can be
thought of as the amount of information that $w_1$ lacks in order to cover as
wide a meaning spectrum as $w_2$.


3.6 Correlation 

In our exemplary creation of lexical density operators (Section 3.1) we
mentioned one way of making use of a multipartite Hilbert space: Each
subsystem $V_{R_i}$ represents a type of syntactic relation $R_i$. This means
that the eigen kets of the density operator $\hat{w}$ represent neighborhoods
of context words around the word $w$. In [9] we are shown that these eigen
kets contain correlation among their subsystems. For example, the two verbal
senses of book have different typical neighbors: “The customer booked a
ticket” vs. “The police booked a thief”. The correlation inside book is among
its arguments: customer is correlated with ticket, and police is correlated
with thief. There would be no correlation among the subject and object
subsystems if, for example, it were common and unambiguous to say “The
customer booked a thief” and “The police booked a ticket”.

The correlation measure used in [9] does not analyze how much of the found
correlation is classical and how much is quantum. The book example here shows
that subjects and objects are not independent. However, this example only
exhibits classical correlation. Since the pairs customer-ticket and
police-thief are in different senses of book this phenomenon can be modelled
with a separable density operator.

Non-separability comes with there being correlation inside at least one eigen
ket. This is an additional piece of structure that may well find linguistic
application. But it will take further investigation to identify specific ways
of exploiting this advantage.

4 Conclusions 

We have surveyed several measures from probability and information theory 
and their quantum analogues. While the classical versions of these measures 
apply to regular probability distributions, their quantum versions are for density 
operators. Measurement, fidelity and relative entropy are functions over (at least) 
two density operators A, B which describe their interaction. We found that a 
common theme among these functions is the term that takes the inner product 
of all pairs of A’s and B’s eigen kets into account. Thereby all possible pairs of
(eigen) probability values from A and B are able to affect the process outcome,
whereas in classical distributions only those probabilities aligned one-to-one by 
an index are processed together. 

This “soft alignment” of probabilities is potentially advantageous for any 
application model involving uncertainty and the interaction of distributions. We 
gave some intuitions as to how the field of distributional semantics may benefit 
from quantum information theory. We identified several existing works which 
already take advantage of density matrices and made suggestions about further 
use thereof. In particular the generalization of information-theoretic measures 
brings with it added possibilities for modeling complex interactions among words 
and phrases, or whatever objects are represented by density operators.